AITopics | contextual bandit problem

082e82cae0232f45f27fdd2612c31f8a-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 10:30:51 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.68)

Genre: Research Report (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Fairness in Learning: Classic and Contextual Bandits

Matthew Joseph, Michael Kearns, Jamie H. Morgenstern, Aaron Roth

Neural Information Processing Systems, Apr-22-2026, 12:12:56 GMT

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on "chained" confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence, providing a strong separation between fair and unfair learning that extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm and vice versa. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
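The "chained" confidence interval idea in the abstract above can be illustrated with a small sketch. It assumes each arm already has a confidence interval (lower, upper) for its mean payoff, treats touching intervals as overlapping, and plays uniformly at random among the arms linked to the top arm through a chain of overlapping intervals. The function names `chained_to_top` and `fair_choice` are hypothetical, and this is not presented as the paper's exact algorithm or interval construction.

```python
import random


def chained_to_top(intervals):
    """Return the sorted indices of arms chained to the top arm.

    intervals: list of (lower, upper) confidence bounds, one per arm.
    The top arm is the one with the highest upper bound. An arm joins the
    active set if its interval overlaps the interval of any arm already
    in the set, so membership spreads along a chain of overlaps.
    """
    top = max(range(len(intervals)), key=lambda i: intervals[i][1])
    active = {top}
    changed = True
    while changed:
        changed = False
        for i, (lo_i, hi_i) in enumerate(intervals):
            if i in active:
                continue
            # Two intervals overlap when each starts before the other ends.
            if any(lo_i <= intervals[j][1] and intervals[j][0] <= hi_i
                   for j in active):
                active.add(i)
                changed = True
    return sorted(active)


def fair_choice(intervals, rng):
    """Pick uniformly at random among the arms chained to the top arm."""
    return rng.choice(chained_to_top(intervals))
```

For example, with intervals (0.7, 0.9), (0.6, 0.75), (0.5, 0.62) and (0.1, 0.2), the third arm does not overlap the first directly but is chained to it through the second, so the first three arms are all eligible and the fourth is excluded. Choosing uniformly within the chained set is what prevents an arm that may in fact be worse from being favored over one that may be better while the intervals cannot yet separate them.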

artificial intelligence, data mining, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Online Learning with an Unknown Fairness Metric

Stephen Gillen, Christopher Jung, Michael Kearns, Aaron Roth

Neural Information Processing Systems, Mar-15-2026, 07:35:17 GMT

We consider the problem of online learning in the linear contextual bandits setting, but in which there are also strong individual fairness constraints governed by an unknown similarity metric. These constraints demand that we select similar actions or individuals with approximately equal probability [?], which may be at odds with optimizing reward, thus modeling settings where profit and social policy are in tension. We assume we learn about an unknown Mahalanobis similarity metric from only weak feedback that identifies fairness violations, but does not quantify their extent. This is intended to represent the interventions of a regulator who "knows unfairness when he sees it" but nevertheless cannot enunciate a quantitative fairness metric over individuals. Our main result is an algorithm in the adversarial context setting that has a number of fairness violations that depends only logarithmically on T, while obtaining an optimal O(√T) regret bound to the best fair policy.
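The kind of weak feedback described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Mahalanobis distance is taken as the Euclidean norm of A(x - y) for some matrix A, and a pair counts as a violation when its selection probabilities differ by more than that distance. Both the constraint form and the names `mahalanobis` and `flagged_pairs` are assumptions of this sketch and are not taken from the abstract.

```python
import math


def mahalanobis(x, y, A):
    """Distance ||A (x - y)||_2 for vectors x, y and a matrix A (list of rows)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    projected = [sum(a * d for a, d in zip(row, diff)) for row in A]
    return math.sqrt(sum(p * p for p in projected))


def flagged_pairs(contexts, probs, A):
    """Return index pairs (i, j) that a regulator holding A would flag.

    Only the identity of each violating pair is returned, never the size
    of the violation, mirroring feedback that identifies violations
    without quantifying them.
    """
    flagged = []
    for i in range(len(contexts)):
        for j in range(i + 1, len(contexts)):
            gap = abs(probs[i] - probs[j])
            if gap > mahalanobis(contexts[i], contexts[j], A):
                flagged.append((i, j))
    return flagged
```

In this sketch the learner never sees A. It only observes which pairs come back flagged, which is why the number of violations incurred while learning the metric is a quantity worth bounding.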

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Overleaf Example

Neural Information Processing Systems, Feb-16-2026, 22:47:32 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Doubly-Robust Lasso Bandit

Gi-Soo Kim, Myunghee Cho Paik

Neural Information Processing Systems, Feb-14-2026, 10:07:11 GMT

While the reward compensation mechanism is unknown, the learner can adapt his (her) decision to the past reward feedback so as to maximize the sum of rewards.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Differentially Private Contextual Linear Bandits

Roshan Shariff, Or Sheffet

Neural Information Processing Systems, Feb-13-2026, 23:43:14 GMT

The objective is to maximize cumulative reward by exploring the actions to discover optimal ones (having the best expected reward), balanced with exploiting them.

artificial intelligence, big data, data mining, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(2 more...)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Online Learning with an Unknown Fairness Metric

Stephen Gillen, Christopher Jung, Michael Kearns, Aaron Roth

Neural Information Processing Systems, Feb-12-2026, 20:02:10 GMT

We therefore assume that the algorithm has access to an oracle that knows intuitively what it means to be fair, but cannot explicitly enunciate the fairness metric.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Contextual Bandits with Cross-Learning

Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider

Neural Information Processing Systems, Feb-12-2026, 11:56:36 GMT

In the classical contextual bandits problem, in each round t, a learner observes some context c, chooses some action a to perform, and receives some reward r_{a,t}(c).
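The round-by-round protocol in this excerpt can be sketched as a simple loop. The sketch assumes contexts are drawn uniformly from a finite list and that the learner sees only the reward of the action it chose. The epsilon-greedy policy is a stand-in chosen for brevity, not the method of any paper listed here, and the names `run_contextual_bandit` and `make_epsilon_greedy` are hypothetical.

```python
import random


def make_epsilon_greedy(actions, epsilon, rng):
    """Build a policy that explores with probability epsilon and otherwise
    exploits the best empirical mean reward seen for the current context."""
    def policy(context, history):
        if rng.random() < epsilon:
            return rng.choice(actions)
        stats = {a: [0.0, 0] for a in actions}  # action -> [reward sum, count]
        for past_context, past_action, past_reward in history:
            if past_context == context:
                stats[past_action][0] += past_reward
                stats[past_action][1] += 1
        # Untried actions score +inf so each is tried once per context.
        return max(
            actions,
            key=lambda a: stats[a][0] / stats[a][1] if stats[a][1] else float("inf"),
        )
    return policy


def run_contextual_bandit(policy, contexts, reward_fn, rounds, rng):
    """Run the protocol: observe context c, choose action a, receive r_{a,t}(c)."""
    total = 0.0
    history = []
    for t in range(rounds):
        c = rng.choice(contexts)      # learner observes a context
        a = policy(c, history)        # learner chooses an action
        r = reward_fn(a, t, c)        # only the chosen action's reward is revealed
        history.append((c, a, r))
        total += r
    return total, history
```

A call such as `run_contextual_bandit(make_epsilon_greedy([0, 1], 0.1, rng), [0, 1], reward_fn, 1000, rng)` with `rng = random.Random(0)` and a user-supplied `reward_fn(a, t, c)` returns the cumulative reward and the full interaction history. Scanning the history on every round keeps the sketch short, and a real implementation would keep running statistics instead.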

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.77)
Information Technology > Game Theory (0.48)
Information Technology > Artificial Intelligence > Machine Learning (0.48)
